DETAILED ACTION

Response to Arguments 

1. Applicant's arguments with respect to claims 1-19 have been considered but are
moot in view of the new ground(s) of rejection.

Claim Rejections - 35 USC § 102 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

3. Claims 1, 3, 4, 7, 10, 12, 13, 15, 16, 18, and 19 are rejected under 35 
U.S.C. 102(e) as being anticipated by Yoshioka et al. (U.S. Patent 7,149,682). 

In regard to claim 1, Yoshioka et al. disclose an audio intonation calibration
method in which an audio signal emitted by a subject (a singer who wants to mimic
another singer, column 9, lines 65-66) is reproduced to the auditory organs of said
subject after real time processing (the embodiment is used as a karaoke apparatus that
allows a user to mimic a particular singer, column 11, lines 9-41; such an application
would inherently require the reproduction to occur in real time, so the output signal
would remain in tempo with the music), which method is characterized in that it
comprises the following steps:

acquisition of a model audio signal to be imitated (a target singer is provisionally
analyzed to produce attribute data, column 10, lines 21-30; in one embodiment, the 
target voice is stored and spectral components are extracted in real time, column 23, 
lines 2-9); 

first spectral analysis of said model audio signal (the spectral shape of the target 
singer is stored as attribute data, column 10, lines 21-30; in one embodiment, the target 
voice is stored and spectral components are extracted in real time, column 23, lines 2- 
9); 

acquisition of an imitation audio signal that corresponds to the model audio signal 
and has been imitated by the subject (a singer who wants to mimic another singer 
provides a voice signal, column 9, line 65 to column 10, line 2); 

second spectral analysis of the imitation audio signal (the spectral shape of the 
voice signal is extracted, column 10, lines 14-19); 

comparison of the spectra of the model audio signal and the imitation audio 
signal (corresponding frames of the input voice signal and target signal are taken, 
column 10, lines 21-30); 

correction of the imitation audio signal as a function of the result of said
comparison (the corresponding frames are combined to correct the mimicking singer's
voice to that of the target signal, column 10, lines 33-42 and column 11, lines 3-5);

wherein at least the spectral analysis step, the comparison step, and the 
correction step are carried out in real time and constitute the real time processing 
(Yoshioka et al. explicitly disclose the spectral analysis step is performed in real time,
column 9, line 65 to column 10, line 2; furthermore, Yoshioka et al. disclose the
processing takes place in a karaoke type system, wherein a song sung by a mimicking
singer is adjusted to be more like a target singer and output along a karaoke
accompaniment, column 23, lines 16-20; thus, all the processing steps of Yoshioka et al.
would have to inherently be "real-time" processing steps; if the processing was not done 
in real-time, the mimicking singer's voice would be out of sync with the karaoke 
accompaniment); and 

after real time processing, reproduction to the auditory organs of the subject of 
the corrected imitation audio signal (the song sung by the mimicking singer is output, 
column 10, line 67 to column 11, line 6 and column 23, lines 16-20).

In regard to claim 3, Yoshioka et al. disclose the comparison steps and correction
steps are executed over a series of frequency bands in the range of audible frequencies 
(both steps are performed on voice signals, column 10, lines 21-30 and column 11, lines 
3-5, see also Figs. 5 and 8, thus the processing must inherently occur "in the range of 
audible frequencies"). 

In regard to claim 4, Yoshioka et al. disclose the series of frequency bands
corresponds to a subdivision of the range of audible frequencies (see Fig. 5, plurality of 
frequencies F0-FN, column 11, line 62 to column 12, line 8). 



Application/Control Number: 10/634,744 Page 5 

Art Unit: 2626 

In regard to claim 7, Yoshioka et al. disclose a step of memorizing the spectral
analysis of said model audio signal to be imitated (the attributes of a target voice are 
previously stored, column 10, lines 21-30). 

In regard to claim 10, Yoshioka et al. disclose the model audio signal to be
imitated is a song (a singer to be mimicked, column 9, line 65 to column 10, line 2) and 
in that said method further includes, simultaneously with the step (E62) of reproducing 
the corrected audio signal to the auditory organs of the subject (S), a step (E62) of 
emitting an accompaniment signal of said song to the auditory organs of the subject (S) 
(the adjusted output of the mimicking singer is output like that of the target singer along 
with a karaoke accompaniment, i.e. the song, column 23, lines 16-20). 

In regard to claim 12, Yoshioka et al. disclose a method of performance of a song
by a subject (S), in which method an audio signal emitted by a subject (S) is reproduced 
to the auditory organs of the subject after real time processing, and which method is 
characterized in that it uses an audio-intonation calibration method according to claim 1 
(the adjusted output of the mimicking singer is output like that of the target singer along 
with a karaoke accompaniment, i.e. the song, column 23, lines 16-20; all the processing 
steps of Yoshioka et al. would have to inherently be "real-time" processing steps; if the
processing was not done in real-time, the mimicking singer's voice would be out of sync
with the karaoke accompaniment).

In regard to claim 13, Yoshioka et al. disclose fixed or removable information
storage means, characterized in that said means contain software code portions 
adapted to execute the steps of an audio-intonation calibration method according to 
claim 1 (hard disks, CD-ROM's etc., column 42, lines 25-35). 

In regard to claim 15, Yoshioka et al. disclose fixed or removable information
storage means, characterized in that said means contain software code portions 
adapted to execute the steps of the method according to claim 12 of performing a song 
(hard disks, CD-ROM's etc., column 42, lines 25-35). 

In regard to claim 16, Yoshioka et al. disclose that, during the correction step, each
frequency band of the imitation audio signal is corrected so that an intensity value of the
imitation audio signal in the respective band
corresponds to an intensity value of the model audio signal in the respective band (Fig.
8(d), a plurality of amplitudes An are each associated with frequency components Fn 
and each amplitude An is adjusted to match the target data element, column 19, line 60 
to column 20, line 35). 

In regard to claim 18, Yoshioka et al. disclose the first spectral analysis step
includes dividing the model audio signal into a multiplicity of frequency bands and
determining an intensity of the model audio signal in each of the frequency bands (in
one embodiment, the target voice is stored and spectral components are extracted in
real time, column 23, lines 2-9; the voice signal is applied to an FFT and frequency and 
amplitude pairs are extracted, column 11, line 62 to column 12, line 8), wherein the 
second spectral analysis step includes dividing the imitation audio signal into same 
frequency bands as the first spectral analysis step and determining an intensity of the 
imitation audio signal in each of the frequency bands (the input voice is subjected to the 
same FFT analysis as the target signal, above, column 11, line 62 to column 12, line 8),
wherein the comparison step includes, for each of the frequency bands, comparing the 
intensity of the model audio signal to the intensity of the imitation audio signal, and 
wherein the correction step includes correcting the imitation audio signal so that, for 
each of the frequency bands, an intensity of the corrected imitation audio signal 
corresponds to the intensity of the model audio signal (Fig. 8(d), a plurality of 
amplitudes An are each associated with frequency components Fn and each amplitude 
An is adjusted to match the target data element, column 19, line 60 to column 20, line 
35). 
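
The band-wise limitations mapped above (dividing both signals into the same frequency bands, comparing per-band intensities, and correcting the imitation signal so that its per-band intensity corresponds to that of the model) can be illustrated with a minimal sketch. It is not taken from Yoshioka et al. or from the application: the function names, the use of a plain DFT on a single frame, the equal-width grouping of bins into `num_bands` bands, and the summed-magnitude measure of intensity are all assumptions made for illustration only.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part (input is assumed conjugate-symmetric)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def band_of(k, n, num_bands):
    """Map bin k (or its mirror n - k) to one of num_bands equal-width bands."""
    f = min(k, n - k)
    return min(f * num_bands // (n // 2 + 1), num_bands - 1)


def band_intensity(spectrum, num_bands):
    """Hypothetical intensity measure: summed magnitude of the bins in each band."""
    n = len(spectrum)
    sums = [0.0] * num_bands
    for k in range(n // 2 + 1):
        sums[band_of(k, n, num_bands)] += abs(spectrum[k])
    return sums


def correct_frame(model, imitation, num_bands=4):
    """Scale each band of the imitation frame toward the model's band intensity."""
    n = len(imitation)
    model_int = band_intensity(dft(model), num_bands)
    imit_spec = dft(imitation)
    imit_int = band_intensity(imit_spec, num_bands)
    # Comparison step: one gain per band; leave silent bands untouched.
    gains = [m / i if i > 1e-12 else 1.0 for m, i in zip(model_int, imit_int)]
    # Correction step: apply the same gain to a bin and its mirror bin.
    corrected = [imit_spec[k] * gains[band_of(k, n, num_bands)] for k in range(n)]
    return idft(corrected)
```

Calling `correct_frame(model, imitation)` on two equal-length frames returns a time-domain frame whose per-band intensities, as measured by `band_intensity`, match those of the model wherever the imitation frame has energy in the band. A real-time system would additionally need framing, windowing and overlap-add, none of which is shown here.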

In regard to claim 19, Yoshioka et al. disclose an audio intonation calibration 
method in which an audio signal emitted by a subject (a singer who wants to mimic 
another singer, column 9, lines 65-66) is reproduced to the auditory organs of said 
subject after real time processing (the embodiment is used as a karaoke apparatus that 
allows a user to mimic a particular singer, column 11, lines 9-41; such an application
would inherently require the reproduction to occur in real time, so the output signal
would remain in tempo with the music), the method comprising the steps of: 

acquiring a model audio signal that is to be imitated by the subject (a target 
singer is provisionally analyzed to produce attribute data, column 10, lines 21-30; in one 
embodiment, the target voice is stored and spectral components are extracted in real 
time, column 23, lines 2-9); 

first spectral analysis of said model audio signal (the spectral shape of the target 
singer is stored as attribute data, column 10, lines 21-30; in one embodiment, the target 
voice is stored and spectral components are extracted in real time, column 23, lines 2-9) 
including dividing the model audio signal into a multiplicity of frequency bands and 
determining an intensity of the model audio signal in each of the frequency bands (in one
embodiment, the target voice is stored and spectral components are extracted in real 
time, column 23, lines 2-9; the voice signal is applied to an FFT and frequency and 
amplitude pairs are extracted, column 11, line 62 to column 12, line 8); 

emitting, by the subject, an imitation audio signal that corresponds to the model 
audio signal (a singer who wants to mimic another singer provides a voice signal, 
column 9, line 65 to column 10, line 2); 

performing a second spectral analysis of the imitation audio signal (the spectral 
shape of the voice signal is extracted, column 10, lines 14-19) including dividing the
imitation audio signal into same frequency bands as in the first spectral analysis step 
and determining an intensity of the imitation audio signal in each of the frequency bands
(the input voice is subjected to the same FFT analysis as the target signal, above,
column 11, line 62 to column 12, line 8);

comparing, for each of the frequency bands, the intensity of the model audio 
signal to the intensity of the imitation audio signal (corresponding frames of the input 
voice signal and target signal are taken, column 10, lines 21-30); 

correcting the imitation audio signal as a function of the comparison step so that, 
for each of the frequency bands, an intensity of a corrected imitation audio signal 
corresponds to the intensity of the model audio signal (Fig. 8(d), a plurality of 
amplitudes An are each associated with frequency components Fn and each amplitude 
An is adjusted to match the target data element, column 19, line 60 to column 20, line 
35); 

wherein at least the second spectral analysis step, the comparing step, and the 
correcting step are carried out in real time and constitute the real time processing 
(Yoshioka et al. explicitly disclose the spectral analysis step is performed in real time,
column 9, line 65 to column 10, line 2; furthermore, Yoshioka et al. disclose the
processing takes place in a karaoke type system, wherein a song sung by a mimicking
singer is adjusted to be more like a target singer and output along a karaoke
accompaniment, column 23, lines 16-20; thus, all the processing steps of Yoshioka et al.
would have to inherently be "real-time" processing steps; if the processing was not done 
in real-time, the mimicking singer's voice would be out of sync with the karaoke 
accompaniment); and

after real time processing, reproducing to the auditory organs of the subject of
the corrected imitation audio signal (the song sung by the mimicking singer is output,
column 10, line 67 to column 11, line 6 and column 23, lines 16-20).

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claim 2 is rejected under 35 U.S.C. 103(a) as being unpatentable over Yoshioka
et al., in view of Gibson et al. (U.S. Patent 6,336,092). 

In regard to claim 2, Yoshioka et al. do not disclose correction of the dynamic
range of the corrected audio signal. 

Gibson et al. disclose a method for converting a source voice signal to a target 
voice signal, wherein the method includes: 

measurement of the dynamic range of the audio signal imitated by the subject 
(the level of a source vocal signal is measured, column 10, line 18); 

measurement of the dynamic range of the corrected audio signal (the level of a 
corrected source vocal signal to which a spectral envelope has been applied is 
measured, column 10, lines 20-21);

comparison of the dynamic range of the imitation audio signal and the corrected
audio signal (the input levels are compared to calculate the final output level, column
10, lines 23-25); and

correction of the dynamic range of the corrected audio signal as a function of the 
result of said comparison before reproduction to the auditory organs of the subject of 
the corrected audio signal (prior to output, see Fig. 3, the amplitude envelope of the 
source vocal signal is applied to the corrected audio signal, column 10, lines 14-45). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Yoshioka et al. to correct the dynamic range of the corrected signal
as a function of the comparison, because this would make the corrected (transformed) 
vocal track the amplitude of the source vocal, as taught by Gibson et al. (column 10, 
lines 14-16). 
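
The level comparison attributed to Gibson et al. above can be pictured as a simple gain adjustment. The sketch below is an illustration under stated assumptions, not the method of Gibson et al.: it takes RMS as the measure of level, operates on a single frame, and the names `rms` and `match_level` are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame (assumed level measure)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def match_level(source, corrected, floor=1e-12):
    """Scale the corrected frame so its level tracks that of the source frame."""
    src_level = rms(source)
    cor_level = rms(corrected)
    if cor_level < floor:
        # A silent corrected frame cannot be rescaled; return it unchanged.
        return list(corrected)
    gain = src_level / cor_level
    return [s * gain for s in corrected]
```

Applied frame by frame before output, a step of this kind would make the amplitude of the transformed signal follow the amplitude of the source vocal, which is the stated rationale for the combination.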

6. Claim 5 is rejected under 35 U.S.C. 103(a) as being unpatentable over Yoshioka
et al., in view of Savic et al. (U.S. Patent 5,327,521).

In regard to claim 5, Yoshioka et al. do not provide a specific number of frequency
bands (frequency bands are generically referred to as frequencies F0-FN, see column
11, line 62 to column 12, line 3).

Savic et al. disclose a method for converting a source voice signal to a target 
voice signal, wherein the method includes analyzing the range of audible frequencies by 
dividing the range of audible frequencies into at least 50 frequency bands (an FFT is 
performed with a length N=512, i.e. 512 frequency bands, column 7, lines 19-21).

It would have been obvious to one of ordinary skill in the art at the time of
invention to divide the range of audio frequencies into at least 50 bands, because as the
number of frequency bands increased, the accuracy of the spectral analysis would
increase, thus the correction to the target signal's spectral shape would be more 
accurate. 
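
As a point of arithmetic for the band count discussed above: a length-N DFT produces N complex bins, of which N/2 + 1 are non-redundant for real-valued input. Under either count, N = 512 exceeds 50. The sketch below computes both figures and the bin spacing; the function names are hypothetical, and the 16 kHz sample rate in the usage note is an assumed value chosen only for the example, not a figure from Savic et al.

```python
def bin_counts(n_fft):
    """Return (total DFT bins, non-redundant bins for real-valued input)."""
    return n_fft, n_fft // 2 + 1


def bin_width_hz(sample_rate_hz, n_fft):
    """Frequency spacing between adjacent DFT bins."""
    return sample_rate_hz / n_fft
```

For example, `bin_counts(512)` gives 512 total bins and 257 non-redundant bins, and `bin_width_hz(16000, 512)` gives a spacing of 31.25 Hz at the assumed sample rate.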

7. Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Yoshioka
et al., in view of Ojard (U.S. Patent 5,966,687). 

In regard to claim 6, Yoshioka et al. disclose a karaoke application, but do not
disclose displaying text. 

Ojard discloses a typical karaoke application wherein the model audio signal to be
imitated (i.e. the song to be sung) is a text and discloses displaying the text (column 1,
lines 26-29).

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Yoshioka et al. to display the text of the model audio signal to be
imitated (i.e. the target voice signal to be mimicked), because this would aid the subject 
in singing the correct lyrics that would match the target voice signal. 

8. Claims 8 and 9 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Yoshioka et al., in view of Jeong (U.S. Patent 5,873,728), in further view of Cave et al.
(U.S. Patent 5,362,240).

In regard to claim 8, Yoshioka et al. do not disclose emitting the model signal
before acquiring the imitation audio signal emitted by the subject. 

Jeong discloses a method of practicing a language being studied wherein a 
model signal to be imitated is emitted to the auditory organs of a subject before the step 
of acquiring the imitation audio signal emitted by the subject (Fig. 2A, step S2, recording 
of a model voice signal is output to the user, column 2, lines 46-58; prior to the speaker 
emulating the reproduced sound, column 3, lines 4-7). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Jeong to emit the model signal prior to acquiring the imitation audio 
signal, so that the subject would know what was to be imitated. 

Furthermore, as to whether one of ordinary skill in the art would combine the 
audio-intonation calibration technique of Yoshioka et al. (which is primarily applied in a
karaoke application) to a method of practicing a language being studied, Cave et al.
provide ample motivation. Specifically, Cave et al. disclose a method of practicing a
language being studied, in which an audio signal emitted by a subject is reproduced to
the auditory organs of the subject after real time processing and modified according to a 
language being studied (see Abstract). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to apply the audio intonation calibration technique of claim 1 and disclosed by 
Yoshioka et al. to the field of practicing a language being studied, because people
progress more rapidly in learning a language if they can hear their own voice modified
by a process to take the characteristics of the foreign language into account, as taught
by Cave et al. (column 1, lines 12-21).

In regard to claim 9, Yoshioka et al. do not disclose emitting the model signal
before acquiring the imitation audio signal emitted by the subject. 

Jeong discloses a method of practicing a language being studied wherein a 
model signal to be imitated is emitted to the auditory organs of a subject before the step 
of acquiring the imitation audio signal emitted by the subject (Fig. 2A, step S2, recording 
of a model voice signal is output to the user, column 2, lines 46-58; prior to the speaker 
emulating the reproduced sound, column 3, lines 4-7). 

Yoshioka et al. and Jeong do not disclose modifying the audio signal to be
imitated as a function of parameters representative of the language being studied. 

Cave et al. disclose modifying an audio signal as a function of parameters 
representative of the language being studied (an audio signal is equalized according to 
the language being learned, column 3, lines 26-40). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention, therefore, to modify the model signal output to the user prior to the step of 
acquiring the imitation audio signal emitted by the subject as a function of parameters 
representative of the language being studied, because this would emphasize the more 
important aspects of the model signal to be imitated by the user, thus helping the user's 
pronunciation.

9. Claims 11, 14, and 17 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Yoshioka et al., in view of Cave et al.

In regard to claim 11, Yoshioka et al. disclose an audio signal emitted by a subject
is reproduced to the auditory organs of the subject after real time processing, and which 
method is characterized in that it uses an audio intonation calibration method according 
to claim 1 (the adjusted output of the mimicking singer is output like that of the target 
singer along with a karaoke accompaniment, i.e. the song, column 23, lines 16-20; all 
the processing steps of Yoshioka et al. would have to inherently be "real-time"
processing steps; if the processing was not done in real-time, the mimicking singer's 
voice would be out of sync with the karaoke accompaniment). 

Yoshioka et al. do not disclose the method is a method of practicing a language
being studied. 

Cave et al. disclose a method of practicing a language being studied, in which an
audio signal emitted by a subject is reproduced to the auditory organs of the subject
after real time processing and modified according to a language being studied (see 
Abstract). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to apply the audio intonation calibration technique of claim 1 and disclosed by 
Yoshioka et al. to the field of practicing a language being studied, because people
progress more rapidly in learning a language if they can hear their own voice modified
by a process to take the characteristics of the foreign language into account, as taught
by Cave et al. (column 1, lines 12-21).

In regard to claim 14, Yoshioka et al. disclose fixed or removable information
storage means, characterized in that said means contain software code portions 
adapted to execute the steps of an audio-intonation calibration method according to 
claim 1 (hard disks, CD-ROM's etc., column 42, lines 25-35). 

Yoshioka et al. do not disclose the method is a method of practicing a language
being studied. 

Cave et al. disclose a method of practicing a language being studied, in which an
audio signal emitted by a subject is reproduced to the auditory organs of the subject
after real time processing and modified according to a language being studied (see 
Abstract). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to apply the audio intonation calibration technique of claim 1 and disclosed by 
Yoshioka et al. to the field of practicing a language being studied, because people
progress more rapidly in learning a language if they can hear their own voice modified
by a process to take the characteristics of the foreign language into account, as taught
by Cave et al. (column 1, lines 12-21).

In regard to claim 17, Yoshioka et al. do not disclose reproducing the corrected
imitation audio signal in headphones on the auditory organs of the subject.

Cave et al. disclose reproducing the corrected imitation audio signal in
headphones on the auditory organs of the subject (earphones 25, column 4, lines 27- 
35). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Yoshioka et al. to reproduce the corrected imitation audio signal in
headphones on the auditory organs of the subject, because headphones would allow 
the user to hear their own corrected voice more accurately. 



Conclusion 

10. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to BRIAN L. ALBERTALLI whose telephone number is
(571) 272-7616. The examiner can normally be reached on Monday-Thursday, 8 AM to
6:30 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on (571) 272-7843. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/David R Hudspeth/ 

Supervisory Patent Examiner, Art Unit 2626 

BLA 10/20/08 



